4 research outputs found

    Accelerating overlapping community detection: Performance tuning a stochastic gradient Markov chain Monte Carlo algorithm

    Building efficient algorithms for data-intensive problems requires deep analysis of data access patterns. Random data access patterns complicate this analysis. In this paper, we discuss accelerating a randomized data-intensive machine learning algorithm using multi-core CPUs and several types of GPUs. A thorough analysis of the algorithm’s data dependencies enabled a 75% reduction in its memory footprint. We created custom compute kernels via code generation to identify the optimal set of data placement and computational optimizations per compute device. An empirical evaluation shows up to 245x speedup compared to an optimized sequential version. Another result from this evaluation is that achieving peak performance does not always match intuition: e.g., depending on the GPU architecture, vectorization may increase or hamper performance.

    ConPaaS: an Integrated Runtime Environment for Elastic Cloud Applications

    Most Cloud applications are re-enactments of traditional enterprise applications such as Web applications, content delivery and e-commerce [1]. The advantages of the Cloud are well-known: access to a near-infinite number of resources